14 research outputs found
Generalized Clusterwise Regression for Simultaneous Estimation of Optimal Pavement Clusters and Performance Models
The existing state-of-the-art approach of Clusterwise Regression (CR) to estimate pavement performance models (PPMs) pre-specifies explanatory variables without testing their significance; as an input, this approach requires the number of clusters for a given data set. Time-consuming ‘trial and error’ methods are required to determine the optimal number of clusters. A common objective function is the minimization of the total sum of squared errors (SSE). Given that SSE decreases monotonically as a function of the number of clusters, the number of clusters that minimizes SSE is always the total number of data points. Hence, the minimization of SSE is not a suitable objective function for finding the optimal number of clusters.
In previous studies, the PPMs were restricted to be either linear or nonlinear, irrespective of which functional form provided the best results. The existing mathematical programming formulations did not include constraints that ensured the minimum number of observations required in each cluster to achieve statistical significance. In addition, a pavement sample could be associated with multiple performance models. Hence, additional modeling was required to combine the results from multiple models.
To address all these limitations, this research proposes a generalized CR that simultaneously 1) finds the optimal number of pavement clusters, 2) assigns pavement samples into clusters, 3) estimates the coefficients of cluster-specific explanatory variables, and 4) determines the best functional form between linear and nonlinear models. Linear and nonlinear functional forms were investigated to select the best model specification. A mixed-integer nonlinear mathematical program was formulated with the Bayesian Information Criterion (BIC) as the objective function. The advantage of using BIC is that it penalizes the inclusion of additional parameters (i.e., number of clusters and/or explanatory variables). Hence, the optimal CR models provided a balance between goodness of fit and model complexity. In addition, the search process for the best model specification using BIC has the property of consistency: asymptotically, it selects the best model with probability 1.
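As a rough illustration of why BIC can pick a finite number of clusters where SSE cannot, the sketch below scores a few candidate solutions. It assumes the common least-squares form of BIC with Gaussian errors, n ln(SSE/n) + k ln(n), which may differ from the exact formulation in the thesis; the candidate SSE values, parameter counts, and sample size are hypothetical.

```python
import math

def gaussian_bic(sse: float, n_obs: int, n_params: int) -> float:
    """BIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)

# Hypothetical candidates: (number of clusters, total SSE, total estimated parameters).
# SSE keeps falling as clusters are added, but the parameter count grows too.
candidates = [(1, 520.0, 3), (2, 310.0, 6), (3, 295.0, 9), (4, 290.0, 12)]
n_obs = 200  # hypothetical number of pavement samples

scores = {k: gaussian_bic(sse, n_obs, p) for k, sse, p in candidates}
best_k = min(scores, key=scores.get)  # number of clusters with the lowest BIC
```

In this made-up example SSE is lowest with four clusters, while BIC is lowest with two, because the small SSE gains beyond two clusters do not offset the ln(n) penalty per added parameter.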
Comprehensive solution algorithms – Simulated Annealing coupled with Ordinary Least Squares for linear models and All Subsets Regression for nonlinear models – were implemented to solve the proposed mathematical problem. The algorithms selected the best model specification for each cluster after exploring all possible combinations of potentially significant explanatory variables. Potential multicollinearity issues were investigated and addressed as required.
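The coupling of simulated annealing with least-squares estimation can be sketched as follows. This is a deliberately simplified illustration, not the algorithm from the thesis: it assumes a fixed number of clusters, a single explanatory variable, SSE as the objective (equivalent to BIC when the number of clusters and variables is fixed), and a geometric cooling schedule. All function names, parameters, and data are hypothetical.

```python
import math
import random

def ols_sse(points):
    """Fit y = a + b*x by ordinary least squares and return the SSE."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in points)

def total_sse(data, labels, n_clusters, min_size):
    """Sum the per-cluster SSE; infeasible if any cluster is below min_size."""
    total = 0.0
    for c in range(n_clusters):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if len(members) < min_size:
            return math.inf
        total += ols_sse(members)
    return total

def anneal(data, n_clusters=2, min_size=3, steps=4000, t0=1.0, cooling=0.998, seed=0):
    """Simulated annealing over cluster memberships; OLS is refit for each move."""
    rng = random.Random(seed)
    labels = [i % n_clusters for i in range(len(data))]  # arbitrary feasible start
    current = initial = total_sse(data, labels, n_clusters, min_size)
    best, best_labels = current, labels[:]
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(data))
        old = labels[i]
        labels[i] = rng.choice([c for c in range(n_clusters) if c != old])
        candidate = total_sse(data, labels, n_clusters, min_size)
        delta = candidate - current
        # Accept improvements always, and worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if current < best:
                best, best_labels = current, labels[:]
        else:
            labels[i] = old  # reject the move and restore the old membership
        temp *= cooling
    return initial, best, best_labels

# Hypothetical data drawn from two different linear trends.
noise = random.Random(1)
data = [(x, 2.0 * x + 1.0 + noise.gauss(0, 0.2)) for x in range(10)]
data += [(x, -1.0 * x + 10.0 + noise.gauss(0, 0.2)) for x in range(10)]
initial_sse, best_sse, best_labels = anneal(data)
```

The minimum cluster size plays the role of the constraint on observations per cluster mentioned earlier in the abstract; an outer loop over the number of clusters and candidate variable subsets, scored by BIC, would be needed to approach the full problem.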
Variables identified as significant explanatory variables were average daily traffic, pavement age, rut depth along the pavement, annual average precipitation and minimum temperature, road functional class, prioritization category, and the number of lanes. All these variables were considered in the literature as the most critical factors for pavement deterioration.
In addition, the predictive capability of the estimated models was investigated. The results showed that the models were robust without any overfitting issues, and provided small prediction errors. The models developed using the proposed approach provided superior explanatory power compared to those that were developed using the existing state-of-the-art approach of clusterwise regression. In particular, for the data set used in this research, nonlinear models provided better explanatory power than did the linear models. As expected, the results illustrated that different clusters might require different explanatory variables and associated coefficients. Similarly, determining the optimal number of clusters while estimating the corresponding PPMs contributed significantly to reducing the estimation error.
Effects on compliance of a HAWK signal in Las Vegas
In 2010, 806 crashes involving pedestrians occurred in Nevada; 36 were fatalities and 796 were injuries. Although numerous pedestrian safety countermeasures exist in Las Vegas, NV, it was ranked as the 6th most dangerous large metropolitan area in the U.S. Additional and more effective safety countermeasures were therefore required to reduce pedestrian crashes in Las Vegas. The High-intensity Activated crossWalK (HAWK) signal has been identified as a potential mechanism to reduce crashes. This study evaluates the effectiveness of such a signal installed at E. Sahara Avenue, Las Vegas. Data were collected from videos captured by two cameras facing eastbound and westbound for two weeks: one week before and one week after the signal began operating. Statistical analyses (descriptive analysis and t-test) were performed considering different performance measures such as pedestrian waiting time at the curb. On average, jaywalking occurrences dropped significantly from 32.6% to 8.2% and the total crossing time decreased by 5.3 seconds. In addition, motorist compliance (yielding to pedestrians attempting to cross the street) improved, with 6.9% fewer non-yielding vehicles.
Measuring fidelity, feasibility, costs: an implementation evaluation of a cluster-controlled trial of group antenatal care in rural Nepal
Background
Access to high-quality antenatal care services has been shown to be beneficial for maternal and child health. In 2016, the WHO published evidence-based recommendations for antenatal care that aim to improve utilization, quality of care, and the patient experience. Prior research in Nepal has shown that a lack of social support, birth planning, and resources are barriers to accessing services in rural communities. The success of CenteringPregnancy and participatory action women’s groups suggests that group care models may both improve access to care and the quality of care delivered through women’s empowerment and the creation of social networks. We present a group antenatal care model in rural Nepal, designed and implemented by the healthcare delivery organization Nyaya Health Nepal, as well as an assessment of implementation outcomes.
Methods
The study was conducted at Bayalata Hospital in Achham, Nepal, via a public private partnership between the Nepali non-profit, Nyaya Health Nepal, and the Ministry of Health and Population, with financial and technical assistance from the American non-profit, Possible. We implemented group antenatal care as a prospective non-randomized cluster-controlled, type I hybrid effectiveness-implementation study in six village clusters. The implementation approach allows for iterative improvement in design, making changes to improve the quality of the intervention. Assessments of implementation process and model fidelity were undertaken using a mobile checklist completed by nurse supervisors, and observation forms completed by program leadership. We evaluated data quarterly using descriptive statistics to identify trends. Qualitative interviews and team communications were analyzed through immersion crystallization to identify major themes that evolved during the implementation process.
Results
A total of 141 group antenatal sessions were run during the study period. This paper reports on implementation results, whereas we analyze and present patient-level effectiveness outcomes in a complementary paper in this journal. There was high process fidelity to the model, with 85.7% (95% CI 77.1–91.5%) of visits completing all process elements, and high content fidelity, with all village clusters meeting the minimum target frequency for 80% of topics. The annual per capita cost for group antenatal care was 0.50 USD. Qualitative analysis revealed the compromise of stable gestation-matched composition of the group members in order to make the intervention feasible. Major adaptations were made in training, documentation, feedback and logistics.
Conclusion
Group antenatal care provided in collaboration with local government clinics has the potential to provide accessible and high quality antenatal care to women in rural Nepal. The intervention is a feasible and affordable alternative to individual antenatal care. Our experience has shown that adaptation from prior models was important for the program to be successful in the local context within the national healthcare system.
Trial registration
ClinicalTrials.gov Identifier: NCT02330887, registered 01/05/2015, retroactively registered
Estimation of optimal pavement performance models for highways
A mathematical program is proposed to determine an optimum number of pavement clusters, memberships of the pavement samples to clusters, and associated significant explanatory variables. Simulated annealing and all subsets regression were used to solve the mathematical program. Potential multicollinearity issues were examined and addressed. All possible combinations of the explanatory variables were explored to select the best model specification. Six-cluster models were determined to be the optimum solution for the dataset used in this research. The resultant models were applied to the test data set to examine the prediction accuracy. Normalized root-mean-square error was calculated for each of the resultant models. The associated models were robust with small prediction errors.
Business intelligence for transportation and infrastructure systems
This study illustrates the advantage of using a business intelligence (BI) approach for the analysis and processing of transportation and infrastructure data. As a case study, a data warehouse, interactive dashboards including maps, and advanced analytics were created for data from the Pavement Management System (PMS) of the Nevada Department of Transportation (NDOT). Combining all these capabilities in a single platform makes it possible to maximize the value of the available data.
Comprehensive clusterwise linear regression for pavement management systems
A comprehensive mathematical program was formulated to determine simultaneously (1) an optimum number of pavement clusters, (2) cluster memberships of pavement samples, (3) cluster-specific significant explanatory variables, and (4) estimated regression coefficients for pavement performance models (PPMs). Simulated annealing coupled with all-subset regression was proposed to solve the mathematical program. The proposed algorithm was capable of identifying and addressing potential multicollinearity issues. All possible combinations of the explanatory variables were examined to select the best model that provided a balance among (1) the number of PPMs; (2) the number of explanatory variables; (3) the resources required to develop, maintain, and use these models; and (4) the explanatory power. For the data set used in this research, six-cluster models were determined as part of the optimum solution. The predictive capabilities of the resultant models were investigated, and results showed that the models provided small prediction errors without any overfitting issues.
Limitations of existing pavement deterioration models and a potential solution
The current state of the art for addressing pavement deterioration is the development of Pavement Deterioration Models using a clusterwise approach that requires a priori knowledge of the optimal number of clusters as well as the significant explanatory variables. In addition, the objective function used to solve the clusterwise problem is the minimization of the sum of squared errors, which always decreases with additional cluster(s) and/or explanatory variable(s). To address these limitations, a mathematical programming framework is proposed based on the Bayesian Information Criterion, which does not require a priori information about the optimal number of clusters. An extensive optimization approach was used to find a solution to the proposed mathematical program, and issues associated with overfitting were investigated. Results using data from the entire State of Nevada illustrate the advantage of the proposed framework.
Generalised clusterwise regression for simultaneous estimation of optimal pavement clusters and performance models
This paper focuses on the clusterwise regression (CR) approach for modelling pavement performance. CR simultaneously clusters the data and estimates the associated models. Previous studies using the CR approach have a few limitations: (1) the explanatory power of variables used in the analyses was not tested; (2) the approach could not find the optimal number of clusters; (3) the objective function was to minimise the sum of squared errors, which is not well suited to finding the optimal number of clusters; (4) the model functional form was restricted to be either linear or nonlinear. To address these limitations, this paper proposes a generalised mathematical programme and solution algorithm within the CR framework. The Bayesian Information Criterion was used as the objective function. The proposed approach explored all possible combinations of potentially significant explanatory variables to select the best model specification. Potential multicollinearity issues in the models were addressed where required. Both linear and nonlinear functional forms were estimated using a large dataset in Nevada. Predictive accuracy of the resultant models was evaluated using root-mean-square error (RMSE), normalised RMSE, and mean absolute error. The results showed that the nonlinear models were more accurate than the linear models in estimating the present serviceability index.
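The three accuracy measures named above can be computed as in the minimal sketch below. The abstract does not say how RMSE was normalised; dividing by the observed range is assumed here as one common convention. The observed and predicted serviceability values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def nrmse(observed, predicted):
    """RMSE divided by the observed range (an assumed normalisation)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical test-set values of a serviceability index.
observed = [4.2, 3.8, 3.1, 2.5, 4.0]
predicted = [4.0, 3.9, 3.3, 2.4, 3.7]

metrics = {
    "rmse": rmse(observed, predicted),
    "nrmse": nrmse(observed, predicted),
    "mae": mae(observed, predicted),
}
```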
Evaluation of the effectiveness of a HAWK signal on compliance in Las Vegas Nevada
Crashes involving pedestrians remain frequent in Nevada despite the numerous safety mechanisms currently used at roadway crossings. Hence, additional and more effective mechanisms are required to reduce crashes in Las Vegas in particular and in Nevada in general. A potential mechanism to reduce conflicts between pedestrians and vehicles is a High-intensity Activated crossWalK (HAWK) signal. This study evaluates the effects of such a signal at a particular site in Las Vegas. Video data were collected using two cameras, facing the eastbound and westbound traffic. One week of video data was collected before and one week after the deployment of the signal to capture the behavior of both pedestrians and drivers. T-test analyses of pedestrian waiting time at the curb, curb-to-curb crossing time, total crossing time, jaywalking events, and near-crash events show that the HAWK system provides significant benefits.
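A before/after comparison of this kind can be sketched with a two-sample t statistic. The abstract does not state which t-test variant was used; Welch's unequal-variance form is assumed here, and the waiting times are hypothetical values, not data from the study.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical pedestrian waiting times at the curb, in seconds.
wait_before = [30, 42, 35, 50, 38, 45]
wait_after = [20, 25, 22, 30, 18, 27]

t_stat, dof = welch_t(wait_before, wait_after)
```

A positive statistic here means longer waits before the signal; the p-value would then be read from a t distribution with the computed degrees of freedom.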
A clusterwise regression approach for the estimation of crash frequencies
In the current literature, data are aggregated for the estimation of functions to explain or predict crash patterns using either clustering analysis, regression analysis, or stage-wise models. Typically, analysis sites are grouped into site subtypes based on predefined characteristics. The assumption is that sites within each subtype experience similar crash patterns as a function of prespecified explanatory characteristics. To develop functions to estimate crashes, all data points are clustered only as a function of associated site characteristics. As a consequence, estimated parameters may be based on different crash patterns that represent various trends, which could be better captured by using multiple functions. To address this limitation, this study proposes a mathematical program utilizing clusterwise regression to assign sites to clusters and simultaneously seek sets of parameter values for the corresponding estimation functions, so as to maximize the probability of observing the available data. Simulated annealing, coupled with maximum likelihood estimation, was used to solve the mathematical program. Results were analyzed for two site subtypes with fatal and all injury crashes: (1) urban multilane divided roadway segments, and (2) urban four-leg signalized intersections. Clusterwise regression improved the predicted number of crashes by using multiple estimation functions within the same site subtype.
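The likelihood-based objective can be illustrated with a small sketch. The abstract does not name the count distribution; a Poisson model with an intercept-only mean per cluster is assumed here purely for brevity, and the crash counts and cluster assignment are hypothetical. For this simple model the maximum likelihood estimate of each mean is the sample mean.

```python
import math

def poisson_loglik(counts, mu):
    """Poisson log-likelihood of the counts under a common mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def fitted_loglik(counts):
    """Log-likelihood at the maximum likelihood estimate (the sample mean)."""
    return poisson_loglik(counts, sum(counts) / len(counts))

# Hypothetical yearly crash counts for sites within one site subtype.
cluster_a = [0, 1, 1, 2, 0, 1]
cluster_b = [6, 8, 7, 9, 5, 7]

# One estimation function for the whole subtype.
pooled = fitted_loglik(cluster_a + cluster_b)
# One estimation function per cluster, as in clusterwise regression.
clusterwise = fitted_loglik(cluster_a) + fitted_loglik(cluster_b)
```

In a fuller version, each cluster mean would depend on site characteristics through estimated coefficients, and a search such as simulated annealing would choose the assignment that maximizes the summed log-likelihood.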